arXiv:1503.08639vl [cs.LG] 30 Mar 2015 


Sparse plus low-rank autoregressive identification in 
neuroimaging time series* 
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Abstract —This paper considers the problem of identifying 
multivariate autoregressive (AR) sparse plus low-rank graphical 
models. Based on the corresponding problem formulation 
recently presented, we use the alternating direction method 
of multipliers (ADMM) to efficiently solve it and scale it to 
sizes encountered in neuroimaging applications. We apply this 
decomposition on synthetic and real neuroimaging datasets with 
a specific focus on the information encoded in the low-rank 
structure of our model. In particular, we illustrate that this in¬ 
formation captures the spatio-temporal structure of the original 
data, generalizing classical component analysis approaches. 

1. Introduction 

Identifying the links between the variables in multivariate 
datasets is a fundamental and recurrent problem in many 
engineering applications. To this end, the use of graphs espe¬ 
cially in neuroimaging applications has become very popular 
because they allow to study and represent the interactions 
between variables in a concise manner [1]. 

A particular class of graphs, called graphical models, 
encodes information about dependence between the vari¬ 
ables conditioned on all the other variables, or conditional 
dependence [2]. For static models this information is con¬ 
tained in the inverse of the covariance matrix also known 
as the precision matrix, and can be estimated by solving 
the covariance selection problem [3]. Additionally, sparsity 
and/or low-rank structural constraints can be imposed to the 
precision matrix estimation. The sparsity constraint results 
from the parsimony principle in model fitting, i.e., one 
assumes few direct interactions between the variables, and 
is obtained through /I-norm regularizers [4]. The low-rank 
structure, induced through nuclear-norm regularizers, models 
the presence of latent variables that are not observed but 
generate a common behavior in all the observed variables 
[5]. The low-rank modeling is inspired from what is done in 
classical component analysis techniques, and leads to models 
that are simpler and more interpretable [6], [7]. An example 
of graphical model is given in Fig. 1 

*This paper presents research results of the Belgian Network DYSCO 
(Dynamical Systems, Control, and Optimization), funded by the Interuni¬ 
versity Attraction Poles Program, initiated by the Science Policy Office of 
the Belgian State. The scientific responsibility rests with its authors. BM 
is supported as an FRS-FNRS research fellow (Belgian Fund for Scientific 
Research). 

^RL, BM and RS are with the Department of Electrical Engi¬ 
neering and Computer Science, University of Liege, Liege, Belgium. 
{R.Liegeois,B.Mishra}@ulg.ac.be 

^MZ is with the Department of Information Engineering, University of 
Padova, 35131 Padova, Italy. ZorziMat@dei.unipd.it 

^RS is with the Department of Engineering, Trumpington Street, 
University of Cambridge, Cambridge CB2 IPZ, United Kingdom. 
R.Sepulchre@eng.cam.ac.uk 



Fig. 1. Ten observed variables Xi,i G [1... 10] with few interactions 
among them (sparsity) and one latent variable xn (low-rank structure). 


In dynamical models, the additional information of the 
ordering of the data is taken into account and datasets are 
seen as time series. A widely used class of models encoding 
this information are autoregressive (AR) models which are 
characterized by their power spectral density, the dynamic 
equivalent of the covariance matrix [8]. As in the static 
case, it has been shown that a zero in the inverse power 
spectral density corresponds to conditional independence 
between two variables [9]. In the dynamic case the (inverse) 
power spectral density is encoded in a block Toeplitz matrix. 
Because of this particular structure the classical /1-norm can 
not be used to induce sparsity in the inverse power spectral 
density. This problem is solved by introducing an alternate 
regularization proposed in [10]. Finally, [11] presents a uni¬ 
fying framework allowing sparse plus low-rank identification 
of inverse power spectral densities in multivariate time series. 

In this paper we adapt the problem formulated in [11] 
to the alternating direction method of multipliers (ADMM) 
framework of [12] in order to scale it with larger datasets for 
which the CVX Matlab toolbox of [13] is computationally 
expensive. In particular, we exploit separability of constraints 
of the ADMM framework to decouple the sparsity and 
the low-rank constraints. The first update is a projective 
gradient update similar to the one proposed in [10] and the 
second update is a well known projection onto the cone of 
positive semidefinite matrices. In the numerical examples 
we show the efficacy of our proposed algorithm on a real 
neuroimaging dataset. We also provide further insight into 
the information encoded in the low-rank structure of our 
model by applying the proposed algorithm to datasets with 
different spatio-temporal structures, which is shown to be at 
least partially recovered in the latent variables. 

The paper is organized as follows. We present the op¬ 
timization problem leading to this sparse plus low-rank 
decomposition in Section II and we explain how we use 





ADMM to efficiently solve it in Section III. We then show 
the results of our approach on synthetic and real data in 
Section IV and conclude. 


between the manifest variables and A = is low 

rank. Since x is an AR process of order p, we can assume that 
H and A belong to the family of matrix pseudo-polynomials 


II. Problem formulation 

We first introduce some basic notions, explain the moti¬ 
vation of the sparse plus low-rank (S-fL) graphical models 
and then formally deduce the corresponding optimization 
problem. Finally, we define latent components that we use in 
the numerical examples in order to characterize information 
encoded in the low-rank part of this decomposition. 

Consider a g^-dimensional autoregressive (AR) gaussian 
process x = [xi{t )... Xq{t)Y of order p 

p 

x{t) = '^Aix{t -i) + e(f), 

i=l 

where x{t) G W, Ai G i = l...p and e{t) 

is white gaussian noise with covariance matrix S. x is 
completely characterized by its spectral density which 

encodes information about dependence relations between the 
q variables [8]. On the contrary the inverse power spectral 
density encodes conditional dependence relations 

between variables [9], [14]. That is, two variables Xk and xi 
are independent, conditionally on the other q — 2 variables 
of X over {t G Z}, if and only if 

V0e[O,27r], (1) 

The nodes of the corresponding graphical model are the q 
variables of x and there is no edge between the two nodes 
Xk and xi if and only if Q is satisfied. 

A. S+L graphical models 

Assume that x(t) = [(x’^(t))^ where x'^{t) = 

[xi{t).. .Xm{ty\^ ^ contains manifest variables, 

that is variables accessible to observations, and x\t) = 
[xm-\-i{t)... Xm+iit)]"^ ^ contains latent variables, not 
accessible to observations. The power spectrum of x can be 
expressed using the following block decomposition 
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where the * denotes the conjugate transpose operation. 

In order to better characterize the conditional dependence 
relations between the manifest variables, from ^ we obtain 
the following decomposition of using the Schur com¬ 
plement [15] 

. (3) 

The main modeling assumption here is that I m and 
that the conditional dependencies relations among the m 
manifest variables encoded in can be largely explained 
through few latent variables. The corresponding graphical 
model has few edges between the manifest variables and 
few latent nodes, as in Fig. 1. This leads to a S+L structure 
for following H — A, where S = is 

sparse because it encodes conditional dependence relations 


Qm,p = { ^ 

j=-p 

Following [11] we further rewrite Z and A as 


S-A = AAA*, 
A = ALA*, 


(4) 


where A is a shift operator A(e*^) := [/ e'^^I ... 
and X and L are now matrices belonging to (5m(p+i) which 
is the set of symmetric matrices of size m(p+l)xm(p+l). 

Finally, Mm,p is the vector space of matrices W := 
... Wp] with VFo G Qm and Wi...Wp e 
The linear mapping T : Mm,p Qm(p+i) outputs a 
symmetric block Toeplitz matrix from the blocks of W as 
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Wi 

■ ■ ■ W^p\ 

T{W) = 


m) 
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The adjoint operator of T is the linear mapping V : 
Qm(p+ 1 ) ^ defined for a matrix A G Qm(p+i) 

partitioned in square blocks of size m x m sls 


X = 


fXoo Xoi 
X^o ^11 




(5) 


\xl xl ■■■ x,p 

Following this partition, W = V{X) G ^ is given by 


r tyo(x) = 

I WjiX) = 2YX'JoXKh+j Vj = i...p. 


B. S+L identification problem formulation 

Assume that we have a finite length realization 
x'^{l )... x'^{N) of the manifest process x'^. It should be 
emphasized that no data is available regarding the latent 
process x\ and its dimension is not even known. Our 
goal is to recover the S+L model defined in the previous 
section which best explains the collected data. Therefore, 
one estimate of A and L (hence of S and A) is given by 
solving the regularized maximum entropy problem [16], [17] 
for which the primal is 


min — log det Aqo + (C, A) 

X,LGQrn,(p+l) 

-I- X-fh{V{X + L)) -f A trace(L) (7) 
subject to A A 0, Aqo 0, L A 0, 
where 

• 7 > 0 and A > 0 are weighting parameters leading to 
a sparse H and a low-rank A, 

• C = T{R) where R G Mm,p are the p + 1 first sample 
covariance lags Rk [8], 

• (.,.) is the inner product associated with 










• h is the following function chosen to encourage a 
structured sparse solution V{X + 1/) 

^(^) = y!max{|(yo)iil, max |(Yfc)y| max \(Yk)ji\}, 

^' k=l,...,p k=l,...,p 

j>i 

which is convex but non smooth. This limitation was con¬ 
toured in similar works [10] by solving the corresponding 
dual problem. In the present case, it can be shown based on 
[10] and [11] that the dual of ([7]) is 

max ^(C + T(Z)) 
zeMm,p 

subject to ELo(l(-^fe)iil + \i^k)ji\) < 7^, j 
diag(E) = 0, A: = 0,...,p 
XI + T{z) y 0, 

where : (5m(p+i) ^ is defined as 

^-(y) = -logdet(yo0 - VEp,0E7p'l:pV^l:p.0) " TU 
and V G Qm(p+i) is partitioned as in ([^. 

C. Definition of latent components 

The optimal primal variables are recovered 

from the optimal solution following [11] from which 
Y^opt ^opt computed by Q. is a square matrix 
function of size m, defined over the unit circle and for which 
we consider the pointwise singular value decomposition 
since is low-rank for all G [0, 27r] 

AXrn{0)=r^AmUm*rn,l{e). (9) 

The i-th column of contains the strength of the 

conditional dependence relation between the i-th unobserved 
latent variable and each of the m manifest variables. Fol¬ 
lowing component analysis nomenclature [6] we call the i- 
th column of the i-th latent component and denote 

it In other words, with i G [1... /] and j G 

[1... m] represents the weight of the conditional dependence 
between the latent variable and the manifest variable 

Xj{t) in the expression of Xj{t). In the static case, 7vW 
reduces to a constant vector because A = / in 0 and A 
does not depend on 6>. On the contrary, in the dynamic case 
each latent component is a function of 0 e [0, 27r]. 

III. Alternating direction method oe multipliers 

We use the alternating direction method of multipliers 
(ADMM) of [12] to solve This choice is motivated 
by the fact that this algorithm both inherits the strong 
convergence properties of the method of multipliers and 
exploits decomposability of the dual problem formulation 
leading to efficient partial updates of the variables. We show 
how we rewrite 0 in the ADMM format by separating 
the sparse and low-rank constraints, then explain how we 
choose an adequate stopping criterion and recover the primal 
variables. 


In order to decouple the constraints related to sparsity and 
low-rank we introduce the new variable Y G Qm(v-\-i) 
reformulate ^ as 

min ^(C + nZ)) 

Z 

subject to A{Z) < b, 

Y = r{Z) + A/, 

y ^ 0 

where the first constraint (Ci) gathers the first two constraints 
on Z of ([^ and A and b are defined accordingly; and (C 2 ) is 
the last constraint of ^ imposing positive semidefiniteness 
of the new variable Y. Using the augmented Lagrangian 
formulation, we introduce Lp defined by 

Lp{Z, y, M) = ^{C + T{Z)) -{M,Y- T{Z) - XI) 

+^\\Y-T{Z)-XI\\l, 

where || • ||i? is the Frobenius norm. Subsequently, the ADMM 
updates are 


II 

+ 

min Lp{Z,Y’^,M’^), 
z e Cl 

(SI) 

^ y/c+1 _ 

min Lp{Z^+^,Y,M’^), 

Y eC2 

(S2) 

= 

Mk _ p(yfe+i _ - XI). 

(S3) 


It should be noted that (SI) has no closed form solution 
and corresponds to the sparsity set of constraints of [10]. We 
approximate the solution by a projective gradient step as in 
[10]. Following this approach, 

Z'^+^ =Ilc,{Z'^ -tk^zLp{Zk,Yk,Mk)), 

where 

. VzLpiZk, Yk, Mk) = 

+pViTiZ’‘)+XI-Y>=), 

• tk is found from the Armijo conditions, 

• Uci is the projection onto Ci which reduces to a 
projection onto the /i-norm ball [10], [18]. 

The optimization problem (S2) has a closed form solution 
and is computed as 

yfe+i = iIcAm^ - T(Z'^+^) - XI), 

P 

where Ilc 2 is the projection onto the cone of symmetric 
positive semidefinite matrices of size m(p+l) xm(p+l), 
which is done by selecting the eigenvectors corresponding 
to positive eigenvalues. This leads to the final updates of 
the ADMM algorithm. 


ADMM for sparse plus low-rank inverse power spectral 
density estimation. Initialize Zq, Yq, Mq; set p > 0; and 
successively update variables as follows: 

= nc,{Z^-tu^zLpiZk, Yu, Mk)), 
yfe+i= -T{Z'^+^)-XI), (11) 

- p{Y'^+^-T{Z’^+^)-XI). 

Following [12], a stopping criterion for is based on the 
primal and dual residuals r and s that respectively measure 


(Cl) (10) 

(C2), 





satisfaction of the equality constraint of ( p^ and the distance 
between two successive iterates of the additional variable Y. 
r and s should satisfy ||r||i? < and \\s\\f < where 
^pri ^duai defined as 

+e-'max{Av/m(p + l), \\T{Z’^)\\f, ||r''||f }, 

= mv/^Tle“'’"+e''"'||P(M'=)||F. 

Here and are the predefined absolute and relative 
tolerances for the problem. 

A variation is obtained when p is multiplied by a factor 
of r > 1 at each iteration up to a maximum value pmax 
starting from a value po depending on the application. 

Convergence analysis of the ADMM algorithm follows 
from [12, Section 3.2][19]. The computational cost per 
iteration of the updates depends on the projections 
onto Cl and C 2 , the gradient evaluation of Lp, and the 
linear mappings T and V leading to a final complexity 
0(m3(p+l)3). 

IV. Numerical examples 

In this section we apply the proposed ADMM algorithm 
to solve the sparse plus low-rank decomposition on synthetic 
and real datasets and explore the type of information encoded 
in the identified latent components. The Matlab code for 
the algorithm is available from the webpage http : / /www. 
monteflore.ulg.ac.be/-rliegeois/ 

A. Application on linear synthetic data 

This synthetic dataset consists of time series corresponding 
to a first order AR model (dynamic model, p = 1) with 
the interaction graph presented in Fig. The interaction 
graphs of the manifest variables (support of S) identified 
for different values of A and A 7 are represented in Fig. 
as well as /, the number of latent components (rank of A) 
that were identified. In order to discriminate between models 
we compute a score function /, defined in [ 10 ], taking into 
account fitting to the data and complexity of the model. 


(B)True interaction 
graph 



(C) Optimal interaction 
graph in static case 
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Fig. 2. (A) Interaction graphs of estimated models for different values of 

A and A7 using a first order AR model (p = 1). (B) True interaction graph. 
(C) Optimal interaction graph of estimated model obtained in the static case 

(P = 0). 


As expected, higher values of A promote models with less 
latent components, and higher values of A 7 favor models 
with few interactions between the manifest variables. The 
model with the best (lowest) score function recovers the 
true interaction graph with the correct number of latent 
components (Fig. [^). The optimal interaction graph in the 
static case (Fig.[^), on the contrary, does not recover exactly 
the true interaction graph. 

The stopping criterion based on the primal and dual 
residuals is illustrated in Fig. using this dataset. The 
algorithm stops when ||r ||2 < and || 5||2 < are 

both satisfied. 


Evolution of primal and dual residuals 




Fig. 3. Typical convergence scheme of the ADMM algorithm obtained 
with the linear synthetic dataset with the parameter values pQ = 1, t = 1.1, 
= 1000, = 10-5, and = 10-^. 

B. Application on non-linear synthetic data 

The interest in considering a non-linear generative model 
is that it can produce endogenous sustained oscillations in 
networks similarly to what is observed in neuroimaging data 
such as fMRI time series. A popular non-linear model is the 
Hopfield model widely used in neural networks [20]: 

Xi = -Xi + sat(^ GijXj) + e, 
jAi 

where Xi denotes the level of activity of the variable i, 
Gij is the strength of the connection from Xj to Xi, and 
sat is a sigmoidal saturation function. Fig. shows how 
we generate oscillations in two clusters. In the first case 
(Fig. 1^) the oscillations are coupled, leading to dephased 
oscillations of the same frequency whereas in the second case 
the clusters are decoupled (Fig.|^), leading to oscillations of 
different frequencies in the two clusters. We finally generate 
three different datasets for each configuration by sampling 
these time series at different frequencies. The first dataset is 
produced by using the original time series (no sampling), the 
second and third datasets are obtained by sampling the time 
series with a period of T1 and T2 to generate synthetic data 
with higher frequency content. 

Fig. shows the static and dynamic latent components 
as defined in section |II-C| that are identified in the optimal 
models from the synchronous oscillations datasets of Fig. E^- 
Since the results are very similar within the nodes of each 
cluster and for clarity purposes we plot only the value of the 
latent components in node 1 ( 7 ^, 1 ) and in node 5 ( 72 , 5 ). 





































(A) Generative Hopfield model 


(C) Weak coupling between clusters 


Time courses in nodes of two clusters with uncoupled oscillations 


Cluster 1 (4 nodes) 

^ O' / 


(B) Strong coupling between clusters 
Time courses in nodes of two clusters with coupled oscillations 



Fig. 4. (A) The generative model contains two clusters and the values of the directed connectivity between the nodes is indicated on the corresponding 
arrows. These values correspond to the matrix C in fn). (B) Set of parameters leading to coupled oscillations and (C) decoupled oscillations. 



Fig. 5. Latent components identified in the static and dynamic cases from 
the three datasets generated from coupled oscillators (Fig. SB). 


In the static case two latent components corresponding 
to the two clusters of the generative model are identified. 
There is no significant difference for the three input datasets 
suggesting that the frequency content of original data is not 
encoded in the static latent components. In the dynamic case 
(p = 1), the latent components are encoded in and a 

single latent component is identified in the optimal model. 
Interestingly, the T2-sampled synthetic data (high-frequency 
time series) leads to a latent component showing a strong 
high-frequency content whereas the original dataset leads to 
a latent component with dominant low-frequency content. 

Having the same approach on asynchronous oscillations, 
we get the results shown in Fig. In the static case we 
still obtain two latent components recovering the two clusters 
with no distinction between the three starting datasets. In the 
dynamic case {p = 1), however, the optimal model now has 
two latent components, each one capturing the oscillations 
in one of the two clusters. As in the previous case the 
frequency content of the synthetic data is encoded in the 
frequency content of the latent component. Identifying two 
latent components probably comes from the fact that the 
frequency of oscillation in the two clusters are different. 
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Fig. 6. Latent components identified in the static and dynamic cases from 
the three datasets generated from decoupled oscillators (Fig. EP) 


suggesting that the dynamic latent component identifies a 
spatio-temporal subspace of variation common to different 
manifest variables. 

C. Application to neuroimaging data 

We illustrate the application of our proposed algorithm 
on real neuroimaging data consisting of functional magnetic 
resonance imaging time series in 90 brain regions collected 
on 17 patients during rest [21]. It should be noted that 
this problem dimension is not tractable with a standard 
optimization tool such as the CVX toolbox of [13]. The 
classical approach is to use component analysis to extract 
neuronal networks. During rest, three networks are robustly 















































identified using component analysis: the visual network 
(VN), the default mode network (DMN) and the executive 
control network (ECN) that are represented in Fig [7] 


(A)VN (B)DMN (C)ECN 



Fig. 7. Three resting state neuronal networks are commonly recovered 
using component analysis: (A) the visual network, (B) the default mode 
network, and (C) the executive control network. 


such as common spectral content. Applied to a neuroimaging 
dataset, this interpretation led to a novel characterization of 
the dynamical interplay between two neuronal networks me¬ 
diating consciousness, echoing recent experimental results. 

As a future research direction we intend to explore whether 
additional information can be recovered in higher order 
dynamical latent components (p > 1). It would also be 
interesting to study the mathematical link between the latent 
components of the proposed framework and the widely used 
static principal components that also capture a ‘common 
subspace’ of variation of many variables from the covariance 
matrix instead of the precision matrix. Finally, in addition 
to using separability into sparse and low-rank constraints, it 
could be beneficial to exploit these structures in the algorithm 
updates in order to scale it to even larger problems. 


By fitting a first order AR model (p = 1) to this dataset we 
identify latent components corresponding to these networks. 
In Fig. we plot only these components for = 0, a low- 
frequency contribution called 70 = 7 ( 6 ^ = 0 ) from the 
definition of Section II-C[ and = tt, a high-frequency 
contribution called 7 ^^ = 7 ( 6 ^ = tt). Indeed, for a first order 
model the latent components are characterized by these two 
extreme values 70 and 77 ^. 



(A) Visual Network 


Fig. 8. (A) The visual network is recovered in 14 out of 17 subjects. (B) 

The DMN and ECN are coupled into a unique latent component in 12 out 
of 17 subjects. 

In 14 subjects out of 17 we observe that VN is recovered 
in one latent component, with no significant differences 
between 70 and 77 ^. On the other hand, in 12 subjects the 
DMN and ECN networks are gathered in a unique latent 
variable, the DMN corresponding to 70 and ECN to 77 ^. This 
interestingly echoes some recent results suggesting that the 
DMN and the ECN are both consciousness related processes 
and are anti-correlated, oscillating at similar frequencies [ 21 ]. 

V. Conclusion 

The contribution of our work is twofold. First, we reformu¬ 
lated the sparse plus low-rank autoregressive identification 
problem into the ADMM framework in order to scale it 
to larger datasets encountered in neuroimaging applications. 
Second, we presented a deeper exploration of the type of 
information encoded in the latent components identified in 
the low-rank part of the decomposition. Fig. [^ & [^ suggest 
that this information is richer in dynamic models than in 
static models in the sense that dynamic latent components 
recover spatio-temporal properties of the original time series 
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